
    Edition 1.1 of the PARSEME shared task on automatic identification of verbal multiword expressions

    This paper describes the PARSEME Shared Task 1.1 on automatic identification of verbal multiword expressions. We present the annotation methodology, focusing on changes from last year’s shared task. Novel aspects include enhanced annotation guidelines, additional annotated data for most languages, corpora for some new languages, and new evaluation settings. Corpora were created for 20 languages, which are also briefly discussed. We report the organizational principles behind the shared task and the evaluation metrics used for ranking. The 17 participating systems, their methods and the results they obtained are also presented and analysed.

    Selecting a part-of-speech tagger with priority on precision in lexical categories

    In this article, four part-of-speech (PoS) taggers for Spanish are compared. The evaluation was carried out without any prior training or tuning of the taggers. To allow for a comparison across taggers, their tagsets were mapped to the universal PoS tagset (Petrov, Das, and McDonald, 2012). The taggers were also compared with regard to the information they provide and how they handle features specific to Spanish, such as verbal clitics and portmanteau contractions (al, del).
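    The abstract does not name the taggers' native tagsets. The sketch below is a rough illustration of the mapping step and rests on several assumptions. It assumes EAGLES-style tags, such as those produced by FreeLing, whose first character encodes the category. It maps these onto the 12 universal tags of Petrov, Das, and McDonald (2012) and then computes per-category precision, which is one plausible way to compare taggers category by category. The mapping table and both function names are hypothetical and do not describe the authors' procedure.

```python
from collections import Counter

# Hypothetical mapping from the first character of an EAGLES-style Spanish tag
# (e.g. "NCMS000", "VMIP3S0", "Fc") to the 12-tag universal tagset.
EAGLES_TO_UNIVERSAL = {
    "A": "ADJ",   # adjective
    "C": "CONJ",  # conjunction
    "D": "DET",   # determiner
    "F": ".",     # punctuation
    "I": "X",     # interjection (no dedicated universal tag)
    "N": "NOUN",  # noun, common and proper
    "P": "PRON",  # pronoun
    "R": "ADV",   # adverb
    "S": "ADP",   # adposition (preposition)
    "V": "VERB",  # verb, including auxiliaries
    "Z": "NUM",   # numeral
}


def to_universal(tag: str) -> str:
    """Map one EAGLES-style tag to a universal tag; empty or unknown tags become 'X'."""
    if not tag:
        return "X"
    return EAGLES_TO_UNIVERSAL.get(tag[0], "X")


def precision_per_tag(gold: list[str], predicted: list[str]) -> dict[str, float]:
    """For each universal tag the tagger predicted: correct predictions / all predictions."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be aligned token by token")
    predicted_counts = Counter(predicted)
    correct_counts = Counter(p for g, p in zip(gold, predicted) if g == p)
    return {tag: correct_counts[tag] / n for tag, n in predicted_counts.items()}
```

    For example, `to_universal("VMIP3S0")` returns `"VERB"`. Precision is computed per predicted tag because a tagger can score well overall while being unreliable on a specific lexical category.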

    Dutch compound splitting for bilingual terminology extraction

    Compounds pose a problem for applications that rely on precise word alignments, such as bilingual terminology extraction. We therefore developed a state-of-the-art hybrid compound splitter for Dutch that makes use of corpus frequency information and linguistic knowledge. Domain-adaptation techniques are used to combine large out-of-domain frequency lists with dynamically compiled in-domain ones. We perform an extensive intrinsic evaluation on a gold-standard set of 50,000 Dutch compounds and a set of 5,000 Dutch compounds from the automotive domain. We also propose a novel word-alignment methodology that makes use of the compound splitter. As compounds are not always translated compositionally, we train the word alignment models twice: first on the original data set and then on a version of the data set in which the compounds are split into their component parts. The resulting word alignment points are then combined.
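    The abstract does not describe how the splitter uses corpus frequencies. As a rough illustration, here is a sketch of a common frequency-based heuristic, not the authors' hybrid system. Among all segmentations of a word into known parts, including the unsplit word, the heuristic picks the one whose parts have the highest geometric-mean frequency. The linking-morpheme set, the minimum part length and the toy frequency list are assumptions, and the function names are hypothetical.

```python
from math import prod

# Assumed Dutch linking morphemes; "" means the parts are joined directly.
LINKERS = ("", "s", "e", "en")


def candidate_splits(word: str, freq: dict[str, int], min_len: int = 3) -> list[list[str]]:
    """All segmentations of word into parts found in freq (the unsplit word included)."""
    results = [[word]] if word in freq else []
    # Every split point leaves a head and a tail of at least min_len characters.
    for i in range(min_len, len(word) - min_len + 1):
        head, tail = word[:i], word[i:]
        for link in LINKERS:
            if not head.endswith(link):
                continue
            stem = head[: len(head) - len(link)]  # strip the linking morpheme, if any
            if len(stem) >= min_len and stem in freq:
                for rest in candidate_splits(tail, freq, min_len):
                    results.append([stem] + rest)
    return results


def score(parts: list[str], freq: dict[str, int]) -> float:
    """Geometric mean of the corpus frequencies of the parts."""
    return prod(freq[p] for p in parts) ** (1 / len(parts))


def best_split(word: str, freq: dict[str, int], min_len: int = 3) -> list[str]:
    """Highest-scoring segmentation; unknown words are returned unsplit."""
    candidates = candidate_splits(word, freq, min_len)
    if not candidates:
        return [word]
    return max(candidates, key=lambda parts: score(parts, freq))
```

    Consider an invented list in which *fiets* and *pad* are far more frequent than *fietspad*. With that list, `best_split("fietspad", freq)` prefers `["fiets", "pad"]`. A frequent compound whose parts are comparatively rare stays whole. The geometric mean guards against splitting into many short, very frequent fragments.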

    Combining translation memories and syntax-based SMT: experiments with real industrial data

    One major drawback of using Translation Memories (TMs) in phrase-based Machine Translation (MT) is that only continuous phrases are considered. In contrast, syntax-based MT allows discontinuous phrases by learning translation rules that contain non-terminals. In this paper, we combine a TM with syntax-based MT via sparse features. These features are extracted during decoding from the translation rules and their corresponding patterns in the TM. We tested this approach in experiments on real English–Spanish industrial data. Our results show that these TM features significantly improve syntax-based MT. Our final system yields improvements of up to +3.1 BLEU, +1.6 METEOR and -2.6 TER compared with a state-of-the-art phrase-based MT system.

    The Harvesting Day: an initiative to enhance the visibility of language resources

    The Harvesting Day is an initiative to ensure the visibility, findability and description of language resources by means of a basic metadata schema. The initiative advocates a change of strategy in which the providers of language resources and technologies become responsible for the visibility of their own resources and of their documentation. Once the descriptions of the resources have been properly created and stored, the corresponding metadata are automatically and periodically harvested and sent to the main virtual repositories and catalogues. This ensures not only the visibility of the language resources and technologies but also the reliability of their data, which are thereby kept up to date.
    Ministerio de Ciencia e Innovación; Departament d’Innovació, Universitats i Empresa de la Generalitat de Catalunya

    Machine translation as an academic writing aid for medical practitioners

    In this paper, we explore the usefulness of Machine Translation (MT) as a writing aid and its impact on the quality of the text produced. We focus on medical practitioners who are native speakers of Spanish and who need to publish their scientific work in English as a foreign language. After carrying out a general survey to determine whether Spanish-speaking medical practitioners already use MT as a writing aid, we engaged five participants in an experiment in which we asked them to write a paper in Spanish that was subsequently machine translated. They were then asked to post-edit the MT output. We analyse their post-edits and further evaluate the overall quality of their texts with the help of a professional proofreader. Our results suggest that texts produced with MT plus post-editing still require many edits before they can be considered of acceptable quality. In the conclusion, we identify several avenues for future investigation that could help achieve better quality.

    Assessment of dietary habits and physical activity levels in school adolescents: a cross-sectional study

    Introduction: Diet and physical activity are the two main modifiable risk factors for preventing and/or controlling overweight and obesity in the paediatric age range. The objective of this study was to assess lifestyles (diet and physical activity) and their association with Body Mass Index (BMI) among adolescents. Material and methods: We carried out a cross-sectional study on lifestyles among school adolescents. We collected sociodemographic information, clinical data, anthropometric measurements, and dietary and physical activity habits. Multiple linear regression was used to assess the association between lifestyles and BMI, adjusted for potential confounders: sex, age, hours of sleep and smoking. Results: The study population consisted of 129 adolescents (51.94% male) with a mean age of 14.88 years. The prevalence of excess weight was 32.80% and was higher among boys than among girls. Overall, 59.70% of the adolescents followed an appropriate diet and 71.10% met the WHO physical activity recommendations, with boys being more physically active than girls. Adolescents with excess weight obtained lower diet-quality scores and engaged in less physical activity than those of normal weight. Moreover, a higher diet-quality score (p = 0.013), more hours of sleep per day (p = 0.032) and being female (p < 0.001) were associated with a lower BMI. Conclusion: We observed a high prevalence of excess weight among adolescents, together with lower diet quality and less physical activity in this group. We also identified an association between lifestyle and BMI among adolescents.

    Ethics Recommendations for Crisis Translation Settings

    This document is a public summary version of the Ethics Recommendations for Crisis Translation Settings produced by members of the INTERACT project team. INTERACT is the International Network in Crisis Translation, a project funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 734211. Further information about the project as a whole is available at: https://sites.google.com/view/crisistranslation/hom

    The first Automatic Translation Memory Cleaning Shared Task

    This is an accepted manuscript of an article published by Springer in Machine Translation on 21/01/2017, available online at https://doi.org/10.1007/s10590-016-9183-x; the accepted version of the publication may differ from the final published version. This paper reports on the organization and results of the first Automatic Translation Memory Cleaning Shared Task. The shared task aimed at finding automatic ways of cleaning translation memories (TMs) that have not been properly curated and therefore include incorrect translations. As a follow-up to the shared task, we also conducted two surveys: one targeting the teams that participated in the shared task and the other targeting professional translators. The researcher-oriented survey gathered participants' opinions on the shared task. The translator-oriented survey aimed to better understand what constitutes a good TM unit and to inform decisions for future editions of the task. In this paper, we report on the data preparation process and the evaluation of the submitted automatic systems, as well as on the results of the surveys.